Enhanced Ingestion with Named Graph Restrictions #19

tekrajchhetri · 2025-02-21T20:57:19Z

Enhanced Ingestion with Named Graph Restrictions

Features Implemented

JSON-LD Ingestion: Supports ingestion of single or multiple JSON-LD files.
Turtle Ingestion: Enables ingestion of single or multiple Turtle files.
Restricted Ingestion: Ensures that data is only ingested into pre-registered named graphs, preventing unintended ingestion of arbitrary data.
PROV Metadata Tracking:
- Captures provenance data for each ingestion activity.
- Records metadata about the named graphs.
- Uses the PROV ontology and DCTERMS (Dublin Core) for recording metadata and provenance.
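
The restricted-ingestion rule and the provenance record described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the in-memory registry, the example graph IRI, the function names, and the record layout are all hypothetical, and the real service looks up registered graphs through the query service.

```python
from datetime import datetime, timezone

# Hypothetical registry; the PR fetches registered graphs from the query service.
REGISTERED_GRAPHS = {"https://brainkb.org/graph/example"}

PROV = "http://www.w3.org/ns/prov#"
DCTERMS = "http://purl.org/dc/terms/"


def check_named_graph(graph_iri, registry=REGISTERED_GRAPHS):
    """Return True only if the target graph was registered beforehand."""
    return graph_iri in registry


def build_provenance(graph_iri, user, when=None):
    """Describe one ingestion activity with PROV and DCTERMS term IRIs."""
    when = when or datetime.now(timezone.utc)
    return {
        "@type": PROV + "Activity",
        PROV + "wasAssociatedWith": user,
        PROV + "startedAtTime": when.isoformat(),
        DCTERMS + "isPartOf": graph_iri,
    }


def ingest(graph_iri, payload, user):
    """Refuse data aimed at an unregistered graph; otherwise attach provenance."""
    if not check_named_graph(graph_iri):
        raise ValueError(f"Named graph {graph_iri!r} is not registered")
    return {
        "graph": graph_iri,
        "data": payload,
        "provenance": build_provenance(graph_iri, user),
    }
```

Calling `ingest` with an IRI outside the registry raises before any data is touched, which is the behaviour the "Restricted Ingestion" item describes.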

aaronkanzer · 2025-04-15T20:20:05Z

ingestion_service/producer/core/routers/api_endpoints_input.py

 ):
    if not files:
        raise HTTPException(status_code=400, detail="No files provided")
+    named_graph_iri = named_graph_exists(graph)
+    if named_graph_iri["status"] != True:


What does it mean if the status is not True? In that case, shouldn't the HTTP response be something other than a 200?

aaronkanzer · 2025-04-15T20:20:34Z

ingestion_service/producer/core/routers/api_endpoints_input.py

@@ -378,7 +414,8 @@ async def ingest_knowledge_graphs_batch(


 @router.post("/upload/document", summary="Ingest a either TXT, JSON and PDF files",
-             dependencies=[Depends(require_scopes(["write"]))]
+             dependencies=[Depends(require_scopes(["write"]))],


Do you also want to include the LoggedIn annotation kwarg that you've been including elsewhere?

tekrajchhetri · 2025-04-15

Yes, I want to include it.

aaronkanzer · 2025-04-15T20:20:50Z

ingestion_service/producer/core/routers/api_endpoints_input.py

            "user": posting_user,
            "filename": file.filename,
            "extension": file.filename.split('.')[-1].lower()
        })


-
 @router.post("/upload/documents",


Similar to https://github.com/sensein/BrainKB/pull/19/files#r2045412497

aaronkanzer · 2025-04-15T20:21:55Z

ingestion_service/producer/core/shared.py

+        }
+
+    try:
+        response = requests.get(endpoint)


requests is a synchronous call -- is this function eventually being called in an upstream async call?
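
If the helper does end up being called from an async endpoint, one common option is to push the blocking call onto a worker thread. The sketch below uses only the standard library; `blocking_lookup` is a hypothetical stand-in for the `requests.get(endpoint)` call in the diff, and the endpoint URLs are placeholders.

```python
import asyncio
import time


def blocking_lookup(endpoint):
    """Hypothetical stand-in for a synchronous call such as requests.get(endpoint)."""
    time.sleep(0.05)  # simulates network latency
    return {"status": True, "endpoint": endpoint}


async def named_graph_exists_async(endpoint):
    # asyncio.to_thread (Python 3.9+) runs the blocking function in a worker
    # thread, so the event loop can keep serving other requests meanwhile.
    return await asyncio.to_thread(blocking_lookup, endpoint)


async def main():
    # Both lookups overlap instead of running back to back.
    return await asyncio.gather(
        named_graph_exists_async("http://localhost/a"),
        named_graph_exists_async("http://localhost/b"),
    )
```

An async HTTP client would avoid the thread entirely; the thread wrapper is simply the smaller change if the helper stays synchronous.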

aaronkanzer · 2025-04-15T20:29:57Z

ingestion_service/producer/core/shared.py

+        return {
+            "status": "error",
+            "message": f"Error connecting to query service: {str(e)}"
+        }


Should this have an HTTP status code indicating it is an error returned as well?
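
One possible shape for this, assuming the helper keeps returning a dict with a `status` key as in the diff: translate the dict into an HTTP-level error at the router boundary. `ServiceError` is a hypothetical stand-in for the framework's HTTP exception, and the choice of 502 and 404 is an assumption, not something this PR specifies.

```python
from http import HTTPStatus


class ServiceError(Exception):
    """Hypothetical stand-in for a web framework's HTTP exception."""

    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = int(status_code)
        self.detail = detail


def raise_for_result(result):
    """Pass successful results through; turn everything else into an HTTP error."""
    status = result.get("status")
    if status is True:
        return result
    if status == "error":
        # The query service could not be reached (assumed mapping: 502).
        raise ServiceError(HTTPStatus.BAD_GATEWAY,
                           result.get("message", "Upstream error"))
    # Any other value is treated as "graph not registered" (assumed mapping: 404).
    raise ServiceError(HTTPStatus.NOT_FOUND,
                       result.get("message", "Named graph is not registered"))
```

With a helper like this, the endpoint never returns a 200 whose body says "error".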

aaronkanzer · 2025-04-15T20:31:23Z

ingestion_service/worker/core/main.py


+logger = logging.getLogger(__name__)

 async def background_task():


Should this also be a part of the @app.on_event("startup")?
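
For reference, tying a background task to application startup and shutdown can be sketched with a plain async context manager, which is the same shape as the lifespan hook FastAPI accepts. The task body, the `seen` list, and the timings below are hypothetical placeholders, not the worker's real logic.

```python
import asyncio
import contextlib


async def background_task(stop, seen):
    """Placeholder loop; the real worker would consume messages here."""
    while not stop.is_set():
        seen.append("tick")
        await asyncio.sleep(0.01)


@contextlib.asynccontextmanager
async def lifespan(seen):
    # Startup: schedule the task on the running event loop.
    stop = asyncio.Event()
    task = asyncio.create_task(background_task(stop, seen))
    try:
        yield task
    finally:
        # Shutdown: ask the loop to finish and wait for it to exit cleanly.
        stop.set()
        await task


async def main():
    seen = []
    async with lifespan(seen):
        await asyncio.sleep(0.05)  # the application would serve requests here
    return seen
```

Starting the task inside the startup hook means it is created once the event loop is running, and it gets a defined shutdown path as well.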

aaronkanzer · 2025-04-15T20:33:55Z

ml_service/README.md

+  - RABBITMQ_URL, i.e., the hostname, by default it is localhost
+  - RABBITMQ_PORT, by default 5672 is used
+  - RABBITMQ_VHOST, default vhost is "/"
+  - 


Did you mean to add something here regarding the ingestion URL?
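
The defaults listed in the README lines above could be read like this. It is a minimal sketch: the function name and the returned keys are hypothetical, and only the three variables and their defaults come from the diff.

```python
import os


def load_rabbitmq_config(env=None):
    """Read RabbitMQ settings, falling back to the defaults named in the README."""
    env = os.environ if env is None else env
    return {
        "host": env.get("RABBITMQ_URL", "localhost"),
        "port": int(env.get("RABBITMQ_PORT", "5672")),
        "vhost": env.get("RABBITMQ_VHOST", "/"),
    }
```

Passing a plain dict as `env` makes the defaults easy to check without touching the real environment.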

tekrajchhetri added 21 commits January 30, 2025 12:38

updated shared.py to add new helper function convert_ttl_to_named_graph

096e809

Updated RabbitMQ listener to handle, i.e., consume, the published message from the publisher.

816e1d9

new functions added to attach provenance information about ingestion activity

2f109c7

checks added to ensure required parameters are present

fedb7b7

namespace updated from test example.org to brainkb.org

ad7f206

update the test connection method to handle empty result for success

3a9413c

endpoint that handles insertion of data to graphdatabase

08f9ca2

new endpoint added

bc8d263

removed /query/insert-jsonld endpoint

2ec9f6f

updated pydantic schema.

b27e37f

code update to end-to-end raw json-ld data ingestion.

c645a04

query service updated to fetch the registered named graphs

ec5fe1f

shared.py updated to add the metadata about the named graph

d322e35

NamedGraphSchema added

c66d071

endpoint renamed and new endpoint to register the named graph added.

047b3f7

fixed issue - unable to upload multiple json-ld/turtle files

136ea9d

logger added

f261d7e

updated configuration.py to read QUERY_SERVICE_BASE_URL for ingestion.

98f033a

updated pydantic schema for named graphs

d380973

new function added to check if the targeted named graphs exists, i.e., registered.

331300d

code updated restrict ingestion of ttl/jsonld (raw+file) to registered graph.

6edb19d

tekrajchhetri requested a review from aaronkanzer February 21, 2025 20:57

tekrajchhetri added 8 commits March 3, 2025 08:56

updated listener to ack the delivery on successful ingestion to db

ac30ae2

init ml service

5d94afb

Update .gitignore

a5d58f5

updated requirements to fix dependency conflict issue

3276a8f

pydantic models for agents, tasks..

9efbe07

crew_memory added

fa40552

new endpoints added

3204684

added parse_yaml_or_json to handle the input

3f51ceb

aaronkanzer reviewed Apr 15, 2025


tekrajchhetri added 23 commits May 9, 2025 15:11

updated structsense + disabled API human feedback

acc27fc

CORSMiddleware added

ff104b5

Update requirements.txt

4ec473f

JWT security added

438e5fc

Update jwt_auth.py

1a0a92f

removed /

162726c

rdflib PyLD added

f3c1ade

removed / from register-named-graph endpoint to maintain consistency

e49d26c

updated listener to handle error and reconnect automatically

ab412ae

Create .gitignore

0624dcf

example .env

49f9d6f

screenshots

7dab3a0

custom configuration for rabbitmq for large frame size

efd76ac

prometheus configuration

af0be6c

updated docker compose to include the prometheus + grafana

ea91562

created readme with detail instructions on deploying about the rabbitmq + prometheus + grafana

8f94d8f

Update readme.md

cb62036

removed prov:wasInformedBy attached to the original triple

dbfe977

new endpoint added for kg file upload + json based ingestion

780cb6b

endpoint renamed

1e18f77

updated code to support json file format for resource (e.g., bbqs) rather than json string

b6f837e

updated to use named-graph iri

481c79e

Namespace issue fixed

4560602


